Requirements (CRUD)
Functional
-
1. User can Read the feed.
2. User can Create the feed(images, text, videos)
Non Functional
-
1. Update the feed
2. Delete the item from feed
3. News feed keep on extending when it gets over
Back of Envelope
World Population | InternetUsers(60%) | FB Users(80-85% of Internet users) | Active FB Users(20%) ----------------------|-----------------------|------------------------------------|-------------------- 7 Billion //Year 2020 | 7 x 0.6 = 4.2 Billion | 4.2 x 0.8 = 3.5 Billion | 3.5 x .2 = 700 Million
Traffic Estimates (Incoming Requests/sec)
700 Million pulling their new feed 5 times a day. Total New feed requests = 700 x 5 = 3500 Million requests/day. 3500/24x60x60 = Around 39,000 Requests/sec.
Storage Estimates
Cache/CDN Suppose We want to keep 500 posts/user on CDN for quick fetch. Total posts to be stored. 500 x 700 Million = 350 Billion. Let's assume on average each posts is 1 KB in size. Total bytes = 350 TB Assuming 1 server can cache is 100GB. Total servers needed = 350 TB / 100 GB = 3500 Machines needed in total
API Design
1. Create Record
POST https://url/v1/post_feed
header {
Authorization: {Bearer "API_KEY_TOKEN"},
}
body {
heading, description, picture_url, video_url
}
2. Reading the Feed
GET HTTP/2 https://url/v1/read_feed
header {
Authorization: {Bearer "API_KEY_TOKEN"},
}
body {
to be filled
}
DB Schema
SQL DB
User information, password, comments can be stored in seperate tables. Likes to Comments can be stored in seperate table and comments table can store like_id.
User Table Password Table
| username | user_id(pk) | created_at | enabled | | user_id(pk) | password_hash | created_at | expiry_at |
Comments Table: Comment can have text,video,image,@,#,likes,shares
| comment_id (pk) | text_id(fk) | user_id (fk) | media-id(fk) | created_at | like_id(fk) | share_id(fk) | ...
id1 t1 u1 m1 - l1 s1
Text Table likes Table Media Table (Photo, text, video)
| key=text_id (pk) | value=text url | | key=like_id (pk) | value=file url storing all likes | | key=text_id (pk) | value=text url |
t1 hello l1 (user2, user3, user4 ..)
>> Friends & Friends of Friends stored in Graph DB
Problems in Above design (likes table)
- 1. Race Condition, when number of likes on comment grows in Millions
This can cause race condition (if 100's of users try to like same comment at same time).
Solution-1 (Key-Value DB: Redis)
-
Seperate row for each like. Key-Value DBs can handle huge datasets with fast lookups
Precomputing the number of likes.
likes Table
SADD comment_id1 user1
SADD comment_id1 user2
SADD comment_id1 user3
SADD comment_id2 user2
SREM comment_id1 user2 # Unlike
SCARD comment:id1:likes # Get like count
Solution-2 (Graph DB(neoj4))
-
In a graph database, you can model User and Comment as nodes and the LIKES relationship as an edge.
Likes = Number of edges
user-1 ----authored--> comment-1
/\
user2 ----------liked------------|
HLD
JWT based authentication to facebook.com
POST Message
The system limits the number of posts a user can make within a certain period, vital to prevent spam and abusive content
@startuml
participant FBApp as app
box AWS
participant LoadB as lb
participant AppServer as as
participant Cache as cache
participant GraphDB as db
end box
hnote over app
User authenticated
Have bearer_token
end note
hnote over as
Authentication,
Rate Limiting
end note
app -> lb: HTTP POST facebook.com/v1/feed\n...Headers...\nAuthotization=1234\ncontent_type\ncontent_len\n\n...BODY...\ntext:"Hello"
lb -> as: HTTP POST "Hello"
as -> cache: "Hello"
cache -> db: "Hello"
@enduml
GET Newsfeed
Push Model | Pull Model | |
---|---|---|
What | Whenever someone posts a item(text, video, audio), Generate newsfeed for all friends/followers and send to friend/follower. Client side browser caches it and delivers when client comes online | Only Generate feed whenever request to retrieve comes from client |
Advantage, Disadvatange |
Adv: Newfeed is calculated in advance, which leads to very less delay on client to read news feed Disadv: For inactive users/friends also newfeed is calculated. More resources are used upfront |
Adv: Newfeed is calculated for active users only, which leads to low resource usage Disadv: Client can get significant latency in receiving newsfeed as its calculated when client asks |
Usecase | we use a push model for the majority of users | Celebrity Case: Whenever a celebrity posts a message. Feed for its followers is calculated and delivered. |
What we will do?
-
We will use both of approaches based on the situation. See usecase in above table
Newfeed Architecture
@startuml
!pragma teoz true
actor user as u
control "App\nService" as as #LightBlue
control "Fanout\nService" as fo #LightBlue
queue "Message\nQueue" as mq
control "Workers" as w #LightBlue
box userdb #Azure
entity UserCache as cudb #LightPink
database UserDB as udb #LightGreen
end box
box postdb #Azure
entity "Cache\nPostDB" as cpdb #LightPink
database "PostDB\nGraphDB" as pdb #LightGreen
end box
box newsfeed #Azure
entity "Cache\nNewsfeed" as cnf #LightPink
control "Newsfeed\nService" as nfs #LightBlue
end box
u -> as: POST "Hello"
as -> cpdb: "Hello"
cpdb -> pdb: "Hello"
as -> fo: userId=1\nCreated POST
group GET Friends of userId=1
fo -> cudb: Get friends\nof userid=1
cudb -> udb:Get friends\nof userid=1
udb -> fo: friend list
end
group GET User Filters
fo -> cudb: Get userid\nfilters
cudb -> udb:Get userid\nfilters
udb -> fo: filters(if\nuser has muted someone)
end
fo -> mq: friendId + PostId
mq -> w: data
w -> cnf: data
cnf -> nfs: data
nfs -> pdb: Get post
pdb -> nfs: post
hnote over nfs
Create post
username, profile
picture,post content
post image
end note
nfs -> u: news feed
@enduml